The bayesynergy R package implements a Bayesian
semi-parametric regression model for estimating the dose-response
function of in-vitro drug combination experiments. The Bayesian
framework offers full uncertainty quantification of the dose response
function and any derived summary statistics, as well as natural handling
of replicates and missing data. The Bayesian model is implemented in
Stan (Stan Development Team 2020),
taking advantage of the efficient ‘No-U-Turn Sampler’ as well as
variational Bayes for quick approximations of the true posterior.
The package also provides plotting functions for summarizing individual drug-response experiments, as well as tools for processing large drug combination screens in parallel and for summarizing and comparing the results.
The dose-response function \(f:\boldsymbol{x} \to (0,1)\) maps drug concentrations \(\boldsymbol{x}\) to a measure of cell viability – zero corresponding to all cells being dead after treatment, one corresponding to all cells still alive. In drug-combination screens, it is common to assume that the dose-response function can be broken down as \[ f(\boldsymbol{x}) = p_0(\boldsymbol{x})+\Delta(\boldsymbol{x}), \] where \(p_0(\boldsymbol{x})\) encodes a non-interaction assumption, and \(\Delta(\boldsymbol{x})\) captures the residual interaction effect.
The non-interaction assumption, \(p_0(\boldsymbol{x})\), captures what can reasonably be assumed about a joint drug effect, given estimates of the drugs’ individual effects. We assume a Bliss-style independence assumption, where we first assume that each drug’s individual dose-response function takes the form of a log-logistic curve \[ h_i(x_i|l,s,m) = l + \frac{1-l}{1+10^{s(x_i-m)}}, \] where \(l\) is the lower asymptote, \(s\) the slope, and \(m\) the drug’s ‘EC50’ on the \(\log_{10}\) scale. The Bliss assumption then amounts to a probabilistic independence assumption, where \[ p_0(\boldsymbol{x}) = h_1(x_1|l_1,s_1,m_1) \ h_2(x_2|l_2,s_2,m_2). \] We call it probabilistic because we can interpret the individual dose-response curves \(h_i(\cdot)\) as probabilities of cell survival. Defining the events \[ \begin{align} A_i & = \text{A cell survives drug A at concentration $x_{1i}$} \\ B_j & = \text{A cell survives drug B at concentration $x_{2j}$} \\ C_{ij} & = \text{A cell survives both drugs at concentration $\boldsymbol{x}=(x_{1i},x_{2j})$}, \end{align} \] the corresponding probabilities become \[ p_0(\boldsymbol{x}) = P(C_{ij}) = P(A_i)P(B_j) = h_1(x_1|l_1,s_1,m_1) \ h_2(x_2|l_2,s_2,m_2). \]
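As a quick illustration, the Bliss surface can be sketched in a few lines of R. The parameter values below are hypothetical, and `log_logistic`/`p0` are ad hoc helper names for this sketch, not functions exported by the package.

```r
# Sketch: Bliss non-interaction surface from two log-logistic
# monotherapy curves (hypothetical parameter values).
log_logistic <- function(x, l, s, m) {
  # x: log10 concentration; l: lower asymptote; s: slope; m: log10 EC50
  l + (1 - l) / (1 + 10^(s * (x - m)))
}
p0 <- function(x1, x2, par1, par2) {
  # Probabilistic independence: P(survive both) = P(survive A) * P(survive B)
  h1 <- log_logistic(x1, par1$l, par1$s, par1$m)
  h2 <- log_logistic(x2, par2$l, par2$s, par2$m)
  h1 * h2
}
par1 <- list(l = 0.1, s = 1.0, m = 0.0)
par2 <- list(l = 0.3, s = 1.5, m = 0.5)
# At each drug's EC50, h = (1 + l)/2, so p0 = 0.55 * 0.65 = 0.3575
p0(0, 0.5, par1, par2)
```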
The interaction component, \(\Delta(\boldsymbol{x})\), captures any joint effect of the drugs that is not accounted for by the non-interaction assumption. If two drugs are more effective together than would be expected under \(p_0\), we call it synergy, which corresponds to \(\Delta < 0\). The opposite effect is deemed antagonism, corresponding to \(\Delta > 0\).
Because the interaction landscape can be complex, with multiple local peaks and valleys, we model this term non-parametrically using a Gaussian process (GP) prior. To ensure that the resulting dose-response function only takes values in the interval \((0,1)\), we push the GP through a transformation function \(g()\). That is, \[ z(\boldsymbol{x}) \sim \mathcal{GP}(0,\kappa(\boldsymbol{x},\boldsymbol{x}')) \\ \Delta(\boldsymbol{x}) = g(z(\boldsymbol{x})), \] where the transformation function is given by \[ g(z(\boldsymbol{x})) = \frac{-p_0(\boldsymbol{x})}{1+\exp\left\{b_1z(\boldsymbol{x})+\log\left[\frac{p_0(\boldsymbol{x})}{1-p_0(\boldsymbol{x})}\right]\right\}} + \frac{1-p_0(\boldsymbol{x})}{1+\exp\left\{-b_2z(\boldsymbol{x})-\log\left[\frac{p_0(\boldsymbol{x})}{1-p_0(\boldsymbol{x})}\right]\right\}}. \] In addition to ensuring the proper bounds for the dose-response function, this transformation has the feature that \(g(0)=0\), which corresponds to an a priori assumption that \[ \mathbb{E}\left[f(\boldsymbol{x}) | p_0(\boldsymbol{x})\right] \approx p_0(\boldsymbol{x}). \] That is, we turn our non-interaction assumption into a formal prior expectation on the dose-response function. This achieves two things: (1) a slightly conservative model that needs to be convinced that interaction effects are present, and (2) no built-in bias towards interaction in the prior structure.
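A minimal R sketch of the transformation makes its behaviour at the extremes visible. The values of \(b_1\), \(b_2\) and \(p_0\) are placeholders, and `g_transform` is an illustrative helper, not part of the package API.

```r
# Sketch of the transformation g() applied to the latent GP value z.
g_transform <- function(z, p0, b1 = 1, b2 = 1) {
  lo <- log(p0 / (1 - p0))  # logit of the non-interaction surface
  -p0 / (1 + exp(b1 * z + lo)) + (1 - p0) / (1 + exp(-b2 * z - lo))
}
p0_x <- 0.4
g_transform(0, p0_x)    # g(0) = 0: prior centred on non-interaction
g_transform(-10, p0_x)  # large negative z: Delta approaches -p0 (f near 0, synergy)
g_transform(10, p0_x)   # large positive z: Delta approaches 1 - p0 (f near 1, antagonism)
```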
The covariance function \(\kappa(\boldsymbol{x},\boldsymbol{x}')\) can be given multiple specifications, including squared exponential, Matérn, and rational quadratic covariance functions. By default, we use a Matérn covariance with the \(\nu\) parameter set to \(3/2\), yielding \[ \kappa(\boldsymbol{x},\boldsymbol{x}') = \sigma_f^2\left(1+\frac{\sqrt{3}\Vert\boldsymbol{x}-\boldsymbol{x}'\Vert}{\ell}\right)\exp\left\{-\frac{\sqrt{3}\Vert\boldsymbol{x}-\boldsymbol{x}'\Vert}{\ell}\right\}. \] Finally, by utilizing the natural grid structure of the drug concentrations, we can write the kernel function as \[ \kappa(\boldsymbol{x},\boldsymbol{x}') = \sigma_f^2 \kappa(x_1,x_1')\kappa(x_2,x_2'), \] which induces a Kronecker product structure on the final covariance matrix. Following the implementation detailed in Flaxman et al. (2015), this greatly improves the computational efficiency of the model.
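The product form and the resulting Kronecker structure can be sketched in base R as follows. The grid and hyperparameter values are toy choices for illustration; the package's actual Stan implementation differs.

```r
# Matern-3/2 kernel on a concentration grid, with the Kronecker
# structure induced by the product form of the covariance function.
matern32 <- function(d, ell) (1 + sqrt(3) * d / ell) * exp(-sqrt(3) * d / ell)

x1 <- c(0, 0.5, 1)   # log10 concentrations, drug 1
x2 <- c(0, 1)        # log10 concentrations, drug 2
ell <- 1; sigma_f2 <- 1

K1 <- outer(x1, x1, function(a, b) matern32(abs(a - b), ell))
K2 <- outer(x2, x2, function(a, b) matern32(abs(a - b), ell))
K  <- sigma_f2 * kronecker(K2, K1)  # full covariance over the 3 x 2 grid
dim(K)  # 6 x 6
```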
Given the above formulation for the dose-response function \(f\), we assume that we have access to noisy observations of it. These observations are typically generated from various cellular assays, e.g. viability assays. In particular, we assume that given concentration points \(\boldsymbol{x}_1,\ldots,\boldsymbol{x}_n\) we have observations \(y_1,\ldots,y_n\) where \[ y_i = f(\boldsymbol{x}_i) + \epsilon_i, \] and the errors \(\epsilon_i\) are normally distributed with mean zero. By default, we model the variance of the observational errors in a heteroscedastic fashion as \[ \text{Var}\left[\epsilon_i\right] = \sigma^2(f(\boldsymbol{x}_i)+\lambda), \] where \(\lambda\) is set to a small value to handle the case when \(f = 0\) but there is still some residual noise. In a typical setup where cell viability is calculated through normalization to positive and negative controls, \(\lambda\) can be set empirically as \[ \lambda = \frac{\sigma^2_{+}}{\sigma^2_{-}}, \] where \(\sigma^2_{+}\) and \(\sigma^2_{-}\) denote the variances of the positive and negative controls, respectively.
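As a sketch, with simulated raw control readings standing in for real plate data, \(\lambda\) would be estimated as:

```r
# Empirical lambda from control readings (simulated placeholders).
set.seed(1)
neg <- rnorm(32, mean = 1000, sd = 100)  # negative controls: unhindered growth
pos <- rnorm(32, mean = 50,   sd = 10)   # positive controls: maximal kill
lambda <- var(pos) / var(neg)
lambda  # roughly (10/100)^2 = 0.01
```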
We choose a heteroscedastic model by default, because in cell viability assays, the observations are normalized in relation to positive and negative controls. The positive controls typically have much lower variance compared to the negative controls, which translates to viability measures closer to zero being more precisely measured. We also allow homoscedastic noise as an option.
The positive and negative controls essentially control the signal-to-noise ratio in cell viability assays. If the user has access to these, they can be included in the model to help calibrate the posterior distribution, particularly when no replicates are available.
Let \(\xi^-_k\) and \(\xi^+_l\) denote the negative and positive controls for \(k=1,\ldots,n_-\) and \(l=1,\ldots,n_+\). These measurements are raw readings from the plate and are used to calculate cell viability. For an additional well, treated with drug concentration \(\mathbf{x}_i\), we denote the raw output by \(\xi_i\), and calculate cell viability for this well by the formula \[ y_i = \frac{\xi_i-\tilde{\xi^+}}{\tilde{\xi^-}-\tilde{\xi^+}}, \] where \(\tilde{\xi^-}\) and \(\tilde{\xi^+}\) denote some measure of centrality of the negative and positive controls, respectively, typically the mean or median.
The controls can themselves be passed through this formula and converted to percent viability. From the variances of these normalized controls, \(\lambda\) can be set as indicated above, and the negative controls can be added directly into the algorithm. Negative controls represent unhindered cell growth and can be thought of as samples from the dose-response function \(f(\mathbf{x})\) at concentration \(\mathbf{x}=(0,0)\). They can be added to the \(\texttt{bayesynergy}\) function in the same way as regular observations.
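A minimal sketch of this normalisation, using simulated plate readings and the median as the centrality measure. `to_viability` is an illustrative helper, not a package function.

```r
# Viability normalisation against positive and negative controls.
to_viability <- function(xi, neg, pos) {
  (xi - median(pos)) / (median(neg) - median(pos))
}
set.seed(2)
neg <- rnorm(16, 1000, 100)  # negative controls (no drug)
pos <- rnorm(16, 50, 10)     # positive controls (maximal kill)
to_viability(c(900, 500, 100), neg, pos)  # decreasing viability
# The controls themselves can be normalised the same way; negative
# controls then act as observations of f at zero concentration:
summary(to_viability(neg, neg, pos))
```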
The full model specification, with all default prior distributions, looks like \[ y_i \sim \mathcal{N}\left(f(\boldsymbol{x}_i),\sigma^2(f(\boldsymbol{x}_i)+\lambda)\right), \ i = 1,\ldots, n \\ \sigma \sim \text{Inv-Ga}\left(5,1\right), \ \lambda = 0.005 \\ f(\boldsymbol{x}_i) = p_0(\boldsymbol{x}_i)+\Delta(\boldsymbol{x}_i) \mathbb{I}(10^{\boldsymbol{x}_i}>0) \\ p_0(\boldsymbol{x}) = h_1(x_1|l_1,s_1,m_1) \ h_2(x_2|l_2,s_2,m_2) \\ l_j \sim \text{Beta}(1,1.25), \ s_j \sim \text{Gamma}(1,1), \ j = 1,2 \\ m_j \sim \mathcal{N}(\theta_j,\sigma_{m_j}^2), \ j = 1,2 \\ \theta_j \sim \mathcal{N}(0,1), \ \sigma_{m_j}^2 \sim \text{Inv-Ga}\left(3,2\right), \ j = 1,2 \\ \Delta(\boldsymbol{x}) = g(z(\boldsymbol{x})), \ z(\boldsymbol{x}) \sim \mathcal{GP}(0,\kappa(\boldsymbol{x},\boldsymbol{x}')) \\ g(z(\boldsymbol{x})) = \frac{-p_0(\boldsymbol{x})}{1+\exp\left\{b_1z(\boldsymbol{x})+\log\left[\frac{p_0(\boldsymbol{x})}{1-p_0(\boldsymbol{x})}\right]\right\}} + \frac{1-p_0(\boldsymbol{x})}{1+\exp\left\{-b_2z(\boldsymbol{x})-\log\left[\frac{p_0(\boldsymbol{x})}{1-p_0(\boldsymbol{x})}\right]\right\}} \\ \kappa(\boldsymbol{x},\boldsymbol{x}') = \sigma_f^2\left(1+\frac{\sqrt{3}\Vert\boldsymbol{x}-\boldsymbol{x}'\Vert}{\ell}\right)\exp\left\{-\frac{\sqrt{3}\Vert\boldsymbol{x}-\boldsymbol{x}'\Vert}{\ell}\right\}, \\ \sigma_f^2 \sim \text{log-}\mathcal{N}(1,1), \ \ell \sim \text{Inv-Ga}(5,5) \\ b_j \sim \mathcal{N}(1,0.1^2), \ j = 1,2. \] Note that some of these specifications can be altered. For example, by default we estimate the lower asymptotes, but they can also be fixed at zero.
In the model specification above, the interaction term is multiplied by an indicator function taking the value 1 if and only if all elements of the concentration vector are strictly positive. This ensures that there is no interaction when either drug is at zero concentration.
From the posterior dose-response function \(f | \mathbf{y}\), we derive a number of summary statistics concerning efficacy, synergy and antagonism.
For the monotherapy curves, we produce estimates of the drug sensitivity score (DSS) of each drug by the integral
\[ DSS_0 = \int_a^b 1-h_j(x) \ \text{d}x, \] where \(a=\min(x_j)\) and \(b=\max(x_j)\). That is, the integral is taken over the measured dose range of the drug in question. This is in contrast to how the regular DSS score is calculated, where integration starts where the monotherapy curve crosses the 90% viability threshold. That device is used to better separate true effects from background noise, but since uncertainty is handled here through the posterior distribution, we don’t need it. The DSS value is further standardized by the total area available for drug efficacy, \[ DSS = \frac{DSS_0}{(b-a)}. \] From here, values can be further standardized as in Yadav et al. (2014).
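For a single draw of the monotherapy parameters, the DSS integral can be sketched numerically in R. The parameter values are hypothetical, and the scaling to a 0–100 range is an assumption made to match the magnitudes reported in the package summaries.

```r
# Numerical DSS for one monotherapy curve (hypothetical parameters).
h <- function(x, l = 0.1, s = 1, m = 0) l + (1 - l) / (1 + 10^(s * (x - m)))
a <- -2; b <- 2                                   # measured log10 dose range
dss0 <- integrate(function(x) 1 - h(x), a, b)$value
dss  <- 100 * dss0 / (b - a)                      # standardised, 0-100 scale
dss  # approximately 45 for these parameter values
```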
To summarise the combined drug-response function, we utilise the measures developed in Cremaschi et al. (2019). The basic building block is the ‘volume under the surface’ or VUS, for which the general integral looks like
\[ VUS_0(f) = \int_a^b \int_c^d f(\mathbf{x}) \ \text{d}\mathbf{x}, \] and the integrals are taken over the observed drug range, i.e. \(a = \min (x_1)\), \(b = \max (x_1)\), \(c = \min (x_2)\), \(d = \max (x_2)\). This is then standardised to obtain a value between zero and 100, \[ VUS(f) = \frac{VUS_0(f)}{(b-a)(d-c)}. \] Furthermore, to make this into an overall measure of efficacy, we define the residual VUS (rVUS) by
\[ rVUS(f) = 100 - VUS(f), \] which makes this value more comparable with the DSS values, where a higher number now indicates a larger efficacy of the drug combination.
The model calculates \(rVUS\) for the dose-response function \(f\), giving a measure of combined efficacy. In addition, we calculate \(rVUS(p_0)\), the non-interaction efficacy. This makes it possible to separate out how much of the total efficacy can be attributed to the non-interaction assumption. For the interaction term \(\Delta\), we compute \(VUS(\Delta)\) for the overall interaction effect, as well as \(VUS(\Delta^{-})\) and \(VUS(\Delta^{+})\) for synergy and antagonism, where \(\Delta^{-}\) and \(\Delta^{+}\) denote the negative and positive parts of \(\Delta\), respectively. That is,
\[ \Delta^{+}(\mathbf{x}) = \max(0,\Delta(\mathbf{x})) \\ \Delta^{-}(\mathbf{x}) = \min(0,\Delta(\mathbf{x})). \] We compute these measures because, frequently, the interaction surface contains both antagonistic and synergistic regions. When taking the average across the whole surface, an antagonistic outlier might cancel an otherwise strong synergistic effect.
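On the concentration grid, these volumes can be approximated with the trapezoidal rule. The sketch below uses a toy interaction surface and is not the package's internal computation; the 0–100 scaling is assumed to match the reported summaries.

```r
# VUS on a grid via 2-D trapezoidal integration, split into
# synergistic (negative) and antagonistic (positive) parts.
trapz2 <- function(x1, x2, z) {
  # z is a length(x1) x length(x2) matrix of surface values
  w1 <- diff(x1); w2 <- diff(x2)
  inner <- (z[-1, -1] + z[-1, -length(x2)] +
            z[-length(x1), -1] + z[-length(x1), -length(x2)]) / 4
  sum(outer(w1, w2) * inner)
}
x1 <- seq(-2, 2, length.out = 9)
x2 <- seq(-2, 2, length.out = 9)
Delta <- outer(x1, x2, function(a, b) -0.3 * exp(-(a^2 + b^2)))  # synergy bump
area  <- diff(range(x1)) * diff(range(x2))
VUS_Delta <- 100 * trapz2(x1, x2, Delta) / area
VUS_syn   <- 100 * trapz2(x1, x2, pmin(Delta, 0)) / area  # negative part
VUS_ant   <- 100 * trapz2(x1, x2, pmax(Delta, 0)) / area  # positive part
```

For this purely synergistic toy surface, `VUS_ant` is zero and `VUS_syn` equals `VUS_Delta`; on a mixed surface the two parts separate regions that would otherwise cancel in the average.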
When running screens with a large number of drug combinations, it is helpful to have a normalised measure for comparing synergy across experiments. The \(rVUS\) scores defined above are already standardized to their drug concentration ranges, but to compare across experiments, we also standardize with respect to the uncertainty in the model. To do this, we calculate a synergy score by normalizing \(VUS(\Delta^{-})\) by its posterior standard deviation: \[ \text{Synergy score} = \frac{\text{mean}(VUS(\Delta^{-}))}{\text{sd}(VUS(\Delta^{-}))}. \]
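Given posterior draws of \(VUS(\Delta^{-})\), the score is just the ratio of their mean and standard deviation. The draws below are simulated placeholders for actual posterior samples.

```r
# Synergy score from (simulated) posterior draws of VUS(Delta^-).
set.seed(3)
vus_syn_draws <- rnorm(4000, mean = -9.7, sd = 2.4)
synergy_score <- mean(vus_syn_draws) / sd(vus_syn_draws)
synergy_score  # around -4: effect large relative to its uncertainty
```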
Frequently, it is of interest to classify an experiment as synergistic or antagonistic. Usually, this has been done by thresholding the synergy measure at a certain level, declaring e.g. everything above 10 as synergistic, everything below -10 as antagonistic, and anything in between as additive (no interaction). The problem is that this completely ignores the underlying measurement error, and as a consequence the thresholding procedure can lead to misclassification. A large estimated effect might be classified as synergistic even though, in reality, it cannot be discerned from the background noise. In the same manner, genuine synergistic effects that are too small, for example because the dose ranges are a bit off, will also be misclassified. By incorporating the uncertainty into the classification, it can be done in a more principled manner.
In Bayesian inference, we can compute what is known as the model evidence. That is, given a probabilistic model \(\mathcal{M}\), and some data we think is generated from it, \(\mathcal{D}\), the evidence is defined as the probability of the data given the model, \(P(\mathcal{D} \vert \mathcal{M})\). We can use this quantity to compare different models; in particular, when comparing two distinct models we can define the Bayes factor, \(\text{BF}_{10}\): \[ \text{BF}_{10}=\frac{P(\mathcal{D}\vert\mathcal{M}_1)}{P(\mathcal{D}\vert\mathcal{M}_0)} = \frac{P(\mathcal{M}_1 \vert \mathcal{D})}{P(\mathcal{M}_0 \vert \mathcal{D})}\frac{P(\mathcal{M}_0)}{P(\mathcal{M}_1)}, \] where \(P(\mathcal{M}_1)\) and \(P(\mathcal{M}_0)\) denote the prior model probabilities. By defining \[ \mathcal{M}_0: f(\mathbf{x}) = p_0(\mathbf{x}) \\ \mathcal{M}_1: f(\mathbf{x}) = p_0(\mathbf{x}) + \Delta(\mathbf{x}), \] and computing \(\text{BF}_{10}\), the Bayes factor tells us whether the interaction surface needs to be included in the model. A high value indicates that \(\mathcal{M}_1\) is preferred over \(\mathcal{M}_0\), and thus that there most likely is some interaction in the experiment. One still needs to choose a cutoff, but connecting it directly to the model evidence makes it less arbitrary. The thresholding itself can be done according to e.g. the table in Kass and Raftery (1995):
| \(\text{BF}_{10}\) | Evidence against \(\mathcal{M}_0\) |
|---|---|
| 1 to 3.2 | Not worth more than a bare mention |
| 3.2 to 10 | Substantial |
| 10 to 100 | Strong |
| >100 | Decisive |
The Bayes factor only tells us whether an interaction is present. Depending on the classification task, one still needs to decide whether the effect is synergistic or antagonistic. For this, one could use e.g. the integral of the interaction surface, \(\text{VUS}(\Delta)\): if it is negative, the experiment is coded as synergistic; if positive, antagonistic.
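The two-step rule can be sketched as a small helper. The threshold of 10 ("strong" evidence in the table above) is one possible choice, and `classify` is illustrative, not a package function.

```r
# Two-step classification: Bayes factor decides presence of interaction,
# sign of VUS(Delta) decides its direction.
classify <- function(bf10, vus_delta, threshold = 10) {
  if (bf10 < threshold) return("additive")
  if (vus_delta < 0) "synergistic" else "antagonistic"
}
classify(27.49, -9.75)  # "synergistic"
classify(2.0,  -9.75)   # "additive": interaction not supported by evidence
```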
The calculation of the Bayes factor is implemented directly in the
bayesynergy function, and can be requested simply by
adding bayes_factor = T to the call. The model evidence and the
Bayes factor itself are computed via the bridgesampling
package (Gronau, Singmann, and Wagenmakers 2020).
In the R package, we’ve attached two example datasets from a large drug combination screening experiment on diffuse large B-cell lymphoma. We’ll use these to show some simple use cases of the main functions and how to interpret the results.
Let’s load in the first example and have a look at it
library(bayesynergy)
data("mathews_DLBCL")
y = mathews_DLBCL[[1]][[1]]
x = mathews_DLBCL[[1]][[2]]
head(cbind(y,x))
##       Viability ibrutinib ispinesib
## [1,] 1.2295618 0.0000 0
## [2,] 1.0376006 0.1954 0
## [3,] 1.1813851 0.7812 0
## [4,] 0.5882688 3.1250 0
## [5,] 0.4666700 12.5000 0
## [6,] 0.2869514 50.0000 0
We see that the measured viability scores are stored in the
vector y, while x is a matrix with two columns
giving the corresponding concentrations where the viability scores were
read off.
Fitting the regression model is simple enough, and can be done with default settings by running the following code (where we add the names of the drugs involved and the concentration units for plotting purposes, and request the Bayes factor).
fit = bayesynergy(y,x, drug_names = c("ibrutinib", "ispinesib"),
units = c("nM","nM"), bayes_factor = T)
##
## SAMPLING FOR MODEL 'gp_grid' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 0.000163 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.63 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 1: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 1: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 1: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 1: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 1:
## Chain 1: Elapsed Time: 2.82966 seconds (Warm-up)
## Chain 1: 2.6061 seconds (Sampling)
## Chain 1: 5.43576 seconds (Total)
## Chain 1:
##
## SAMPLING FOR MODEL 'gp_grid' NOW (CHAIN 2).
## Chain 2:
## Chain 2: Gradient evaluation took 4.8e-05 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.48 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2:
## Chain 2:
## Chain 2: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 2: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 2: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 2: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 2: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 2: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 2: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 2: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 2: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 2: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 2: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 2:
## Chain 2: Elapsed Time: 2.77396 seconds (Warm-up)
## Chain 2: 1.70938 seconds (Sampling)
## Chain 2: 4.48334 seconds (Total)
## Chain 2:
##
## SAMPLING FOR MODEL 'gp_grid' NOW (CHAIN 3).
## Chain 3:
## Chain 3: Gradient evaluation took 7.6e-05 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.76 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3:
## Chain 3:
## Chain 3: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 3: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 3: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 3: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 3: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 3: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 3: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 3: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 3: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 3: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 3: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 3: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 3:
## Chain 3: Elapsed Time: 3.49013 seconds (Warm-up)
## Chain 3: 2.74635 seconds (Sampling)
## Chain 3: 6.23648 seconds (Total)
## Chain 3:
##
## SAMPLING FOR MODEL 'gp_grid' NOW (CHAIN 4).
## Chain 4:
## Chain 4: Gradient evaluation took 6.1e-05 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.61 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4:
## Chain 4:
## Chain 4: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 4: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 4: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 4: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 4: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 4: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 4: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 4: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 4: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 4: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 4: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 4: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 4:
## Chain 4: Elapsed Time: 2.72047 seconds (Warm-up)
## Chain 4: 2.0383 seconds (Sampling)
## Chain 4: 4.75876 seconds (Total)
## Chain 4:
##
## SAMPLING FOR MODEL 'nointeraction' NOW (CHAIN 1).
## Chain 1:
## Chain 1: Gradient evaluation took 4.2e-05 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.42 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 1: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 1: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 1: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 1: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 1: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 1: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 1: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 1: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 1: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 1: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 1: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 1:
## Chain 1: Elapsed Time: 0.376352 seconds (Warm-up)
## Chain 1: 0.242506 seconds (Sampling)
## Chain 1: 0.618858 seconds (Total)
## Chain 1:
##
## SAMPLING FOR MODEL 'nointeraction' NOW (CHAIN 2).
## Chain 2:
## Chain 2: Gradient evaluation took 2.6e-05 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.26 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2:
## Chain 2:
## Chain 2: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 2: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 2: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 2: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 2: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 2: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 2: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 2: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 2: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 2: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 2: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 2: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 2:
## Chain 2: Elapsed Time: 0.38833 seconds (Warm-up)
## Chain 2: 0.294654 seconds (Sampling)
## Chain 2: 0.682984 seconds (Total)
## Chain 2:
##
## SAMPLING FOR MODEL 'nointeraction' NOW (CHAIN 3).
## Chain 3:
## Chain 3: Gradient evaluation took 2.2e-05 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.22 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3:
## Chain 3:
## Chain 3: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 3: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 3: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 3: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 3: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 3: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 3: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 3: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 3: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 3: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 3: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 3: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 3:
## Chain 3: Elapsed Time: 0.387425 seconds (Warm-up)
## Chain 3: 0.294108 seconds (Sampling)
## Chain 3: 0.681533 seconds (Total)
## Chain 3:
##
## SAMPLING FOR MODEL 'nointeraction' NOW (CHAIN 4).
## Chain 4:
## Chain 4: Gradient evaluation took 1.9e-05 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.19 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4:
## Chain 4:
## Chain 4: Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 4: Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 4: Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 4: Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 4: Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 4: Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 4: Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 4: Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 4: Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 4: Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 4: Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 4: Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 4:
## Chain 4: Elapsed Time: 0.396576 seconds (Warm-up)
## Chain 4: 0.4342 seconds (Sampling)
## Chain 4: 0.830776 seconds (Total)
## Chain 4:
## Calculating the Bayes factor
The resulting model can be summarised by running
summary(fit)
##                 mean  se_mean     sd     2.5%      50%  97.5%  n_eff  Rhat
## la_1[1] 0.3348 0.002131 0.0741 1.57e-01 3.45e-01 0.455 1207.5 1.001
## la_2[1] 0.3844 0.007323 0.0710 9.53e-02 3.97e-01 0.455 94.1 1.037
## log10_ec50_1 0.4807 0.004462 0.1564 2.39e-01 4.50e-01 0.863 1229.2 1.000
## log10_ec50_2 -1.0006 0.022637 1.0138 -3.31e+00 -8.51e-01 0.504 2005.6 1.002
## slope_1 2.0134 0.020333 0.8961 8.78e-01 1.81e+00 4.296 1942.3 1.000
## slope_2 1.4893 0.034577 1.1058 8.24e-02 1.23e+00 4.314 1022.8 1.002
## ell 3.0452 0.034864 1.4762 1.28e+00 2.72e+00 6.681 1792.9 1.001
## sigma_f 0.8431 0.016682 0.7972 1.71e-01 6.11e-01 2.915 2283.8 1.000
## s 0.0966 0.000264 0.0147 7.32e-02 9.46e-02 0.130 3111.4 1.001
## dss_1 33.4492 0.044352 2.9510 2.75e+01 3.35e+01 39.134 4426.9 1.000
## dss_2 59.4242 0.044896 2.8232 5.38e+01 5.95e+01 64.692 3954.2 1.000
## rVUS_f 82.7605 0.013293 0.8630 8.10e+01 8.28e+01 84.473 4214.9 1.000
## rVUS_p0 73.0108 0.033918 2.2650 6.85e+01 7.31e+01 77.171 4459.5 0.999
## VUS_Delta -9.7497 0.034910 2.4208 -1.47e+01 -9.66e+00 -5.313 4808.5 1.000
## VUS_syn -9.7978 0.034389 2.3759 -1.47e+01 -9.68e+00 -5.481 4773.5 1.000
## VUS_ant 0.0481 0.001926 0.1124 5.10e-06 8.22e-05 0.402 3404.8 1.000
##
## log-Pseudo Marginal Likelihood (LPML) = 52.53337
## Estimated Bayes factor in favor of full model over non-interaction only model: 27.48539
which gives posterior summaries of the model parameters. In
addition, the model calculates summary statistics of the monotherapy
curves and the dose-response surface, including drug sensitivity scores
(DSS) for the two drugs in question, as well as the volumes that capture
the notions of efficacy (rVUS_f), interaction
(VUS_Delta), synergy (VUS_syn) and antagonism
(VUS_ant).
As indicated, the total combined drug efficacy is around 80%
(rVUS_f), of which around 70 percentage points can be
attributed to \(p_0\)
(rVUS_p0), leaving roughly 10 percentage points worth of
synergy (VUS_syn). We can also note that the model is
fairly certain of this effect, with a 95% credible interval given as
(-14.703, -5.481). This is corroborated by the Bayes
factor, which at 27.49 indicates strong evidence of an interaction
effect in the experiment.
We can also create plots by simply running
plot(fit, plot3D = F)
which produces monotherapy curves, monotherapy summary statistics, and 2D contour plots of the dose-response function \(f\), the non-interaction assumption \(p_0\), and the interaction \(\Delta\). The last plot displays the \(rVUS\) scores discussed previously, with corresponding uncertainty.
The package can also generate interactive 3D plots by setting
plot3D = T. These are displayed using the
plotly library (Plotly Technologies Inc. 2015).
The synergyscreen function provides a workflow for data from large
drug combination screens, where multiple drugs are tested in combination
on multiple cell lines. It takes as input a list of experiments, each
entry being a list containing the elements needed for a call
to the main regression function bayesynergy.
Included in the package is the result of a synergyscreen
run of 583 drug combinations on the A-375 human melanoma cell line from
ONeil et al. (2016). The
synergyscreen object is a list with two entries: a
dataframe with parameter estimates from each experiment, and a list
entitled failed, containing experiments that either failed
to process entirely or had an unsatisfactory fit.
data("ONeil_A375")
length(ONeil_A375$failed)
## [1] 2
We see that two experiments failed to process during an initial run
of synergyscreen. There is a multitude of reasons why an
experiment might fail to process: it could be an input error,
initialization problems, or problems with the parallel processing.
The entries of failed are themselves lists, each
containing the information needed to process the experiment through the
bayesynergy function:
failed_experiment = ONeil_A375$failed[[1]]
names(failed_experiment)
## [1] "y"             "x"             "drug_names"    "experiment_ID"
head(cbind(failed_experiment$y,failed_experiment$x))
##      viability L778123 MK-4827
## [1,] 0.759 0.325 0.223
## [2,] 0.755 0.325 0.775
## [3,] 0.548 0.325 2.750
## [4,] 0.307 0.325 10.000
## [5,] 0.787 0.800 0.223
## [6,] 0.820 0.800 0.775
We can rerun experiments that failed to process by simply passing
the returned synergyscreen object back into the function.
Note that we turn off the default options of saving each fit and plotting
everything, and set method = "vb", indicating that we use
variational inference to fit the model.
fit_screen = synergyscreen(ONeil_A375, save_raw = F, save_plots = F, parallel = F,
bayesynergy_params = list(method = "vb"))
## Chain 1: ------------------------------------------------------------
## Chain 1: EXPERIMENTAL ALGORITHM:
## Chain 1: This procedure has not been thoroughly tested and may be unstable
## Chain 1: or buggy. The interface is subject to change.
## Chain 1: ------------------------------------------------------------
## Chain 1:
## Chain 1:
## Chain 1:
## Chain 1: Gradient evaluation took 0.000387 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 3.87 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: Begin eta adaptation.
## Chain 1: Iteration: 1 / 250 [ 0%] (Adaptation)
## Chain 1: Iteration: 50 / 250 [ 20%] (Adaptation)
## Chain 1: Iteration: 100 / 250 [ 40%] (Adaptation)
## Chain 1: Iteration: 150 / 250 [ 60%] (Adaptation)
## Chain 1: Iteration: 200 / 250 [ 80%] (Adaptation)
## Chain 1: Iteration: 250 / 250 [100%] (Adaptation)
## Chain 1: Success! Found best value [eta = 0.1].
## Chain 1:
## Chain 1: Begin stochastic gradient ascent.
## Chain 1: iter ELBO delta_ELBO_mean delta_ELBO_med notes
## Chain 1: 100 -137.216 1.000 1.000
## Chain 1: 200 -90.691 0.757 1.000
## Chain 1: 300 -31.833 1.121 1.000
## Chain 1: 400 -16.813 1.064 1.000
## Chain 1: 500 8.048 1.469 1.000
## Chain 1: 600 29.445 1.345 1.000
## Chain 1: 700 44.126 1.201 0.893
## Chain 1: 800 52.628 1.071 0.893
## Chain 1: 900 62.217 0.969 0.727
## Chain 1: 1000 79.661 0.894 0.727
## Chain 1: 1100 79.555 0.794 0.513 MAY BE DIVERGING... INSPECT ELBO
## Chain 1: 1200 90.880 0.755 0.333 MAY BE DIVERGING... INSPECT ELBO
## Chain 1: 1300 95.238 0.575 0.219 MAY BE DIVERGING... INSPECT ELBO
## Chain 1: 1400 101.273 0.491 0.162
## Chain 1: 1500 110.030 0.190 0.154
## Chain 1: 1600 119.706 0.126 0.125
## Chain 1: 1700 129.962 0.101 0.081
## Chain 1: 1800 135.372 0.088 0.080
## Chain 1: 1900 142.390 0.078 0.079
## Chain 1: 2000 145.955 0.058 0.060
## Chain 1: 2100 150.294 0.061 0.060
## Chain 1: 2200 150.411 0.049 0.049
## Chain 1: 2300 153.116 0.046 0.049
## Chain 1: 2400 154.017 0.041 0.040
## Chain 1: 2500 157.363 0.035 0.029
## Chain 1: 2600 158.324 0.027 0.024
## Chain 1: 2700 156.021 0.021 0.021
## Chain 1: 2800 159.446 0.019 0.021
## Chain 1: 2900 156.957 0.016 0.018
## Chain 1: 3000 159.206 0.015 0.016
## Chain 1: 3100 158.777 0.012 0.015
## Chain 1: 3200 160.953 0.013 0.015
## Chain 1: 3300 159.286 0.013 0.014
## Chain 1: 3400 162.648 0.014 0.015
## Chain 1: 3500 161.060 0.013 0.014
## Chain 1: 3600 162.884 0.013 0.014
## Chain 1: 3700 163.358 0.012 0.014
## Chain 1: 3800 162.458 0.011 0.011
## Chain 1: 3900 162.783 0.009 0.010 MEAN ELBO CONVERGED
## Chain 1:
## Chain 1: Drawing a sample of size 1000 from the approximate posterior...
## Chain 1: COMPLETED.
## Chain 1: ------------------------------------------------------------
## Chain 1: EXPERIMENTAL ALGORITHM:
## Chain 1: This procedure has not been thoroughly tested and may be unstable
## Chain 1: or buggy. The interface is subject to change.
## Chain 1: ------------------------------------------------------------
## Chain 1:
## Chain 1:
## Chain 1:
## Chain 1: Gradient evaluation took 0.000203 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 2.03 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1:
## Chain 1:
## Chain 1: Begin eta adaptation.
## Chain 1: Iteration: 1 / 250 [ 0%] (Adaptation)
## Chain 1: Iteration: 50 / 250 [ 20%] (Adaptation)
## Chain 1: Iteration: 100 / 250 [ 40%] (Adaptation)
## Chain 1: Iteration: 150 / 250 [ 60%] (Adaptation)
## Chain 1: Iteration: 200 / 250 [ 80%] (Adaptation)
## Chain 1: Iteration: 250 / 250 [100%] (Adaptation)
## Chain 1: Success! Found best value [eta = 0.1].
## Chain 1:
## Chain 1: Begin stochastic gradient ascent.
## Chain 1: iter ELBO delta_ELBO_mean delta_ELBO_med notes
## Chain 1: 100 -19.251 1.000 1.000
## Chain 1: 200 65.855 1.146 1.292
## Chain 1: 300 81.429 0.828 1.000
## Chain 1: 400 99.174 0.666 1.000
## Chain 1: 500 100.229 0.535 0.191
## Chain 1: 600 112.051 0.463 0.191
## Chain 1: 700 115.495 0.401 0.179
## Chain 1: 800 114.997 0.352 0.179
## Chain 1: 900 118.868 0.316 0.106
## Chain 1: 1000 69.986 0.354 0.179
## Chain 1: 1100 117.826 0.295 0.179
## Chain 1: 1200 124.654 0.171 0.106
## Chain 1: 1300 116.066 0.159 0.074
## Chain 1: 1400 127.829 0.151 0.074
## Chain 1: 1500 133.973 0.154 0.074
## Chain 1: 1600 108.241 0.168 0.074
## Chain 1: 1700 132.456 0.183 0.092
## Chain 1: 1800 132.884 0.183 0.092
## Chain 1: 1900 132.526 0.180 0.092
## Chain 1: 2000 136.678 0.113 0.074
## Chain 1: 2100 134.877 0.074 0.055
## Chain 1: 2200 137.507 0.070 0.046
## Chain 1: 2300 137.841 0.063 0.030
## Chain 1: 2400 141.118 0.056 0.023
## Chain 1: 2500 138.944 0.053 0.019
## Chain 1: 2600 139.658 0.030 0.016
## Chain 1: 2700 142.104 0.013 0.016
## Chain 1: 2800 137.320 0.016 0.017
## Chain 1: 2900 146.252 0.022 0.019
## Chain 1: 3000 140.450 0.023 0.019
## Chain 1: 3100 143.051 0.024 0.019
## Chain 1: 3200 145.838 0.024 0.019
## Chain 1: 3300 145.853 0.024 0.019
## Chain 1: 3400 148.668 0.023 0.019
## Chain 1: 3500 140.638 0.027 0.019
## Chain 1: 3600 146.250 0.031 0.035
## Chain 1: 3700 144.928 0.030 0.035
## Chain 1: 3800 140.715 0.029 0.030
## Chain 1: 3900 143.588 0.025 0.020
## Chain 1: 4000 147.081 0.023 0.020
## Chain 1: 4100 142.744 0.025 0.024
## Chain 1: 4200 147.745 0.026 0.030
## Chain 1: 4300 150.514 0.028 0.030
## Chain 1: 4400 148.121 0.028 0.030
## Chain 1: 4500 146.775 0.023 0.024
## Chain 1: 4600 118.622 0.043 0.024
## Chain 1: 4700 151.074 0.063 0.030
## Chain 1: 4800 148.788 0.062 0.024
## Chain 1: 4900 145.882 0.062 0.024
## Chain 1: 5000 148.964 0.062 0.021
## Chain 1: 5100 149.325 0.059 0.020
## Chain 1: 5200 148.311 0.056 0.018
## Chain 1: 5300 150.114 0.055 0.016
## Chain 1: 5400 149.399 0.054 0.015
## Chain 1: 5500 149.456 0.053 0.015
## Chain 1: 5600 146.410 0.032 0.015
## Chain 1: 5700 148.144 0.011 0.012
## Chain 1: 5800 145.647 0.012 0.012
## Chain 1: 5900 150.855 0.013 0.012
## Chain 1: 6000 152.168 0.012 0.012
## Chain 1: 6100 144.875 0.017 0.012
## Chain 1: 6200 147.519 0.018 0.017
## Chain 1: 6300 150.208 0.018 0.018
## Chain 1: 6400 150.932 0.018 0.018
## Chain 1: 6500 151.201 0.019 0.018
## Chain 1: 6600 149.939 0.017 0.017
## Chain 1: 6700 146.972 0.018 0.018
## Chain 1: 6800 146.157 0.017 0.018
## Chain 1: 6900 150.142 0.016 0.018
## Chain 1: 7000 150.781 0.016 0.018
## Chain 1: 7100 149.732 0.011 0.008 MEDIAN ELBO CONVERGED
## Chain 1:
## Chain 1: Drawing a sample of size 1000 from the approximate posterior...
## Chain 1: COMPLETED.
We can also plot the result of the screen:
plot(fit_screen)
Sometimes, the bayesynergy function may return with a
warning. Ideally, we don’t want any warnings at all, and they should be
examined closely, as posterior samples could be unreliable. Usually, the
warning will tell the user how to fix the problem at hand, e.g. by
running the chains for longer (setting iter higher), or setting
adapt_delta higher. See https://mc-stan.org/misc/warnings.html for some general
tips.
Most commonly, the sampler might complain about divergent transitions. The warning will typically look like this:
## Warning: There were 2316 divergent transitions after warmup. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## to find out why this is a problem and how to eliminate them.
This is indicative of a posterior geometry that is tricky to explore.
In the case where there are only a few divergent transitions, the usual
trick is to set adapt_delta to a higher value, i.e. larger
than the default of 0.9. This can be done through the
control option in the bayesynergy call:
fit = bayesynergy(y, x, control = list(adapt_delta = 0.99))
However, the case above, where there are 2316 divergent transitions, is indicative of a misspecified model. In our experience, this can happen for a few reasons.
One common cause is that the lower asymptotes of the dose-response curves are
difficult to estimate; this can often be remedied by setting
lower_asymptotes = FALSE in the call. Unless one is
specifically interested in these parameters, there is no reason to
estimate them – the model fit will typically still be good without
them.
We provide a version of the model that is more robust against inevitable outliers in dose-response data. This is done by swapping out two components of the model formulation: the likelihood, and the prior for the kernel hyperparameters.
For the likelihood, we utilise the log-Pareto-tailed Normal (LPTN) distribution with mean \(\mu\) and standard deviation \(\sigma\), as described in Gagnon, Desgagné, and Bédard (2020) which takes the form \[ f(x)= \begin{cases} \frac{1}{\sigma}\phi\left(\frac{x-\mu}{\sigma}\right) & \text{if } \vert\frac{x-\mu}{\sigma}\vert \leq \tau \\ \frac{1}{\sigma}\phi(\tau)\frac{\tau}{\vert\frac{x-\mu}{\sigma}\vert}\left(\frac{\log (\tau)}{\log(\vert\frac{x-\mu}{\sigma}\vert)}\right)^{\lambda+1} & \text{if } \vert\frac{x-\mu}{\sigma}\vert > \tau \end{cases} \] where \(\phi\) denotes the standard normal pdf. The parameters \((\tau,\lambda)\) are controlled by the user-specified hyperparameter \(\rho \in (2\Phi(1)-1,1)\) as \[ \tau=\Phi^{-1}((1+\rho)/2), \ \ \ \ \lambda=2(1-\rho)^{-1}\phi(\tau)\tau\log(\tau) \]
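For example, with the default value \(\rho=0.9\) (so the central \(90\%\) of the distribution behaves exactly like a normal), these expressions evaluate to approximately \[ \tau=\Phi^{-1}(0.95)\approx 1.645, \ \ \ \ \lambda=\frac{2}{1-0.9}\,\phi(1.645)\cdot 1.645\cdot\log(1.645)\approx 1.69, \] so observations within roughly \(1.645\) standard deviations of the mean are treated as under a Gaussian likelihood, while more extreme observations are assigned to the heavier log-Pareto tail.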
The robust likelihood can be utilized by setting
robust = T when calling the bayesynergy
function. The user-specified parameter \(\rho\) can be set by the rho
argument, by default set to \(0.9\).
For the kernel, we recommend utilising the Matérn kernel with a
Penalized Complexity (PC) prior on the kernel hyperparameters. The PC
prior for the Matérn covariance was described in Fuglstad et al. (2018), and strongly encourages
small deviations from the null model, i.e. a high probability of the
interaction term being zero. The PC prior for the kernel hyperparameters
\((\sigma_f^2,\ell)\) (in the
2-dimensional case) takes the form \[
\pi(\sigma_f^2,\ell)=\tilde{\lambda}_1\tilde{\lambda}_2\ell^{-2}\sigma_f^{-1}\exp\left(-\tilde{\lambda}_1\ell^{-1}-\tilde{\lambda}_2\sigma_f\right),
\] where \((\tilde{\lambda}_1,\tilde{\lambda}_2)\) are
set to reflect the prior beliefs \(P(\ell <
\ell_0)=\alpha_1\) and \(P(\sigma_f^2
> \sigma^2_{f0})=\alpha_2\) by \[
\tilde{\lambda}_1=-\log(\alpha_1)\,\ell_0, \ \ \ \
\tilde{\lambda}_2=-\frac{\log(\alpha_2)}{\sigma_{f0}},
\] where \((\ell_0,\alpha_1,\sigma_{f0}^2,\alpha_2)\)
are hyperparameters that are set by the user. The PC prior can be
enabled by setting pcprior = T when calling the
bayesynergy function, and hyperparameters specified by the
argument pcprior_hypers. By default, we recommend \((\ell_0,\alpha_1,\sigma_{f0}^2,\alpha_2)=(1,0.1,1,0.2)\).
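As a quick sanity check of this calibration: the defaults \((\ell_0,\alpha_1,\sigma_{f0}^2,\alpha_2)=(1,0.1,1,0.2)\) encode a \(10\%\) prior probability of a length-scale below \(1\), and a \(20\%\) prior probability of the marginal variance \(\sigma_f^2\) exceeding \(1\), which corresponds approximately to \[ \tilde{\lambda}_1=-\log(0.1)\approx 2.30, \ \ \ \ \tilde{\lambda}_2=-\log(0.2)\approx 1.61. \]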